Principal Component Analysis

Principal component analysis (PCA) is a dimensionality reduction technique that we will use to find the overarching dimensions representing knowledge about social relationships. In this study, we explore the dimensions that are revealed when a comprehensive list of social relationships is rated on a comprehensive list of dimensions drawn from the previous literature on social relationship knowledge.

Import data

This dataset was collected from a survey hosted on Amazon Mechanical Turk. The survey data were cleaned with a separate Python script, producing a matrix of the average ratings of social relationships on dimensions thought to characterize these relationships. The relationship list was generated with lexical word-vector tools to enumerate all possible social relationships (159 in total). The dimensions comprised all of the dimensions previously proposed in the literature.

Quantitatively selecting the number of components

PCA outputs as many components as there are input dimensions. Because the components are ranked by how much variance they explain, we can exclude components that add little additional information.
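
As an illustrative sketch (random data, not the survey matrix), the two properties above can be seen directly: PCA yields one component per input dimension, and the components come out ranked by the variance they explain.

```python
# Illustrative sketch (not the study's analysis code): PCA on random data,
# showing that the number of components equals the number of input dimensions
# and that the components are ranked by the variance they explain.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))            # 100 observations, 6 dimensions
Xc = X - X.mean(axis=0)                  # center each dimension

# Eigendecomposition of the covariance matrix yields the components
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
explained = eigvals / eigvals.sum()      # proportion of variance per component

print(len(eigvals))                      # 6 components for 6 input dimensions
print(np.round(explained, 3))            # ranked variance proportions
```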

Parallel Analysis

We will use parallel analysis to determine the optimal number of components to retain.
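
The idea behind Horn's parallel analysis can be sketched as follows (a minimal illustration on invented synthetic data, not the survey matrix): keep only the components whose eigenvalues exceed those obtained from random data of the same shape.

```python
# A minimal sketch of Horn's parallel analysis on synthetic data: retain
# components whose eigenvalues exceed those of random data of the same shape.
import numpy as np

def parallel_analysis(X, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rand = np.empty((n_iter, p))
    for i in range(n_iter):
        R = rng.normal(size=(n, p))
        rand[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(R, rowvar=False)))[::-1]
    # count observed eigenvalues above the mean random eigenvalue
    return int(np.sum(obs > rand.mean(axis=0)))

# Synthetic data: 2 latent components expressed across 8 observed dimensions
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
directions = np.linalg.qr(rng.normal(size=(8, 2)))[0]  # two orthonormal axes
X = latent @ (3 * directions.T) + 0.3 * rng.normal(size=(300, 8))
print(parallel_analysis(X))              # suggests 2 components
```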

## Parallel analysis suggests that the number of factors =  NA  and the number of components =  4

Parallel analysis indicates that having 4 components would be optimal.

Screeplot

PCA with no rotation is done here to visualize the amount of variance accounted for by each component.
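
Since the plot itself is not reproduced here, a toy sketch of the quantities a scree plot displays (hypothetical data with deliberately unequal column scales, not the survey matrix):

```python
# Hypothetical sketch of what a scree plot shows: the variance explained by
# each unrotated principal component, in decreasing order.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.25])
centered = data - data.mean(axis=0)

# Singular values of the centered data give the component variances
sing = np.linalg.svd(centered, compute_uv=False)
variances = sing**2 / (len(centered) - 1)

for k, v in enumerate(variances, start=1):
    print(f"PC{k}: variance = {v:.3f}")  # the scree curve's y-values
```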


PCA with varimax rotation

Rotations are used in principal component analyses to make the resulting components easier to interpret. There are two main types of rotation: varimax and oblimin. Here we will use varimax rotation, which maximizes the variance of the squared loadings so that each dimension loads strongly onto a single component rather than across components. Because varimax is an orthogonal rotation, the resulting components remain uncorrelated with each other; oblimin, an oblique rotation, instead allows the components to correlate.
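
A rough numpy sketch of the varimax procedure (illustrative only; it is not the package implementation used in this analysis, and the loading matrix below is invented):

```python
# Sketch of varimax rotation: iteratively find an orthogonal rotation of the
# loading matrix that maximizes the variance of the squared loadings, so each
# dimension loads mainly on one component.
import numpy as np

def varimax(L, n_iter=100, tol=1e-8):
    p, k = L.shape
    R = np.eye(k)                        # accumulated rotation matrix
    d = 0.0
    for _ in range(n_iter):
        Lr = L @ R
        # SVD step of the classic varimax algorithm
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p)
        )
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):        # converged
            break
        d = d_new
    return L @ R

# Invented loadings: each dimension loads on both components before rotation
A = np.array([[0.8, 0.4], [0.7, 0.5], [0.6, 0.5], [0.3, 0.8], [0.2, 0.9]])
rotated = varimax(A)
print(np.round(rotated, 2))              # rows now load mainly on one component
```

Because the rotation is orthogonal, each dimension's communality (row sum of squared loadings) is unchanged; only how that loading is distributed across components shifts.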

## [1] "First four components account for 77.45% of the variance"

Component Loadings

Interactive Relationship Plots

Summary

RC1 = Formality
RC2 = Activeness
RC3 = Valence
RC4 = Exchange

We recover three of the same components seen in the previous studies. In the present study, the component Exchange is new and describes a new region of the feature space.


Supplementary Analyses

Relationship scores across all studies

We have correlated the relationship scores across all of the components for all of the studies. There is strong consistency: a given component in one study (e.g., Valence) is strongly correlated with the same component in the other studies.

Study 3A and 3B Comparison

Here we will compare the results of study 3B, which explored the representational space of 25 relationships on 30 dimensions from the literature, and study 3A, which explored the representational space of 159 relationships on 30 dimensions from the literature. This analysis will show how the social relationships feature space can change based on the relationships that are sampled.
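
The matching step behind the output further below can be sketched as follows (a hypothetical reconstruction with invented loading matrices, not the study data): for each Study 3A component, find the Study 3B component whose loadings have the strongest Spearman correlation in absolute value.

```python
# Hypothetical sketch of the component-matching procedure: match each Study 3A
# component to the Study 3B component with the strongest |Spearman rho| between
# their dimension loadings. All data here are invented.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
loadings_3a = rng.normal(size=(30, 4))   # 30 dimensions x 4 components
# Study 3B as a noisy copy of Study 3A with components 3 and 4 swapped
order = [0, 1, 3, 2]
loadings_3b = loadings_3a[:, order] + 0.3 * rng.normal(size=(30, 4))

for i in range(4):
    rhos = [spearmanr(loadings_3a[:, i], loadings_3b[:, j])[0] for j in range(4)]
    best = int(np.argmax(np.abs(rhos)))
    print(f"Study 3A RC{i + 1} is most strongly correlated to "
          f"Study 3B RC{best + 1} (rho = {rhos[best]:.4f})")
```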

Component loading comparison

## [1] "Study 3A RC1 is most strongly correlated to Study 3B RC1 (rho = -0.7815, p = 1.71e-06)"
## [1] "Study 3A RC2 is most strongly correlated to Study 3B RC2 (rho = -0.2378, p = 0.2049)"
## [1] "Study 3A RC3 is most strongly correlated to Study 3B RC4 (rho = -0.7081, p = 2.05e-05)"
## [1] "Study 3A RC4 is most strongly correlated to Study 3B RC3 (rho = 0.5555, p = 0.0017)"
      Study 3A Components   Study 3B Components
RC1   Formality             Formality
RC2   Activeness            Activeness
RC3   Exchange              Valence
RC4   Valence               Exchange

Three of the four component loadings are moderately to strongly correlated between the two studies (|rho| > 0.50); the Activeness loadings correlate only weakly. There are also differences between components bearing the same name, indicating that the loadings have shifted due to the reduced variety of relationships sampled.

The correlations between relationship component scores are very strong; the “relationship space” of the two studies is very similar.

Note: For both the loading comparison and the relationship-score comparison, I used Spearman correlations to account for differences in the value distributions of the two studies. For example, RC1 for Study 3B ranges from -20 to 20, but for Study 3A it ranges from -2 to 2, even though this is the same component (Formality) in both. Maybe I should norm the scores and then correlate them?
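
One relevant fact for the question above (checked here on toy numbers, not the study scores): Spearman correlation depends only on ranks, so a positive linear rescaling of one study's scores leaves rho unchanged.

```python
# Sanity check: Spearman rho is invariant to positive linear rescaling of one
# variable, since ranks are unchanged. Toy data, not the study scores.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
scores_a = 10 * rng.normal(size=25)           # wide-range scores
scores_b = scores_a + rng.normal(size=25)     # a correlated counterpart

rho_raw = spearmanr(scores_a, scores_b)[0]
rho_scaled = spearmanr(scores_a / 10, scores_b)[0]  # rescale one study
print(np.isclose(rho_raw, rho_scaled))        # True: rho is scale-invariant
```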

Next, we will take a closer look to see whether the same components are highly correlated between studies (i.e., Formality from Study 3A and Formality from Study 3B).